Results 1 - 20 of 24
1.
Rev. panam. salud pública ; 48: e13, 2024. tab, graf
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536672

ABSTRACT (Spanish)

The abstract in Spanish is available in the full text.


ABSTRACT The CONSORT 2010 statement provides minimum guidelines for reporting randomized trials. Its widespread use has been instrumental in ensuring transparency in the evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate impact on health outcomes. The CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trials evaluating interventions with an AI component. It was developed in parallel with its companion statement for clinical trial protocols: SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 29 candidate items, which were assessed by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a two-day consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The CONSORT-AI extension includes 14 new items that were considered sufficiently important for AI interventions that they should be routinely reported in addition to the core CONSORT 2010 items. CONSORT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention is integrated, the handling of inputs and outputs of the AI intervention, the human-AI interaction and provision of an analysis of error cases. CONSORT-AI will help promote transparency and completeness in reporting clinical trials for AI interventions. 
It will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the quality of clinical trial design and risk of bias in the reported outcomes.



2.
Rev. panam. salud pública ; 48: e12, 2024. tab, graf
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536674

ABSTRACT (Spanish)

The abstract in Spanish is available in the full text.


ABSTRACT The SPIRIT 2013 statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports: CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 26 candidate items, which were consulted upon by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The SPIRIT-AI extension includes 15 new items that were considered sufficiently important for clinical trial protocols of AI interventions. These new items should be routinely reported in addition to the core SPIRIT 2013 items. SPIRIT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention will be integrated, considerations for the handling of input and output data, the human-AI interaction and analysis of error cases. SPIRIT-AI will help promote transparency and completeness for clinical trial protocols for AI interventions. 
Its use will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the design and risk of bias for a planned clinical trial.



3.
Rev. panam. salud pública ; 47: e149, 2023. tab, graf
Article in Spanish | LILACS-Express | LILACS | ID: biblio-1536665

ABSTRACT (Spanish)

The abstract in Spanish is available in the full text.


ABSTRACT The SPIRIT 2013 statement aims to improve the completeness of clinical trial protocol reporting by providing evidence-based recommendations for the minimum set of items to be addressed. This guidance has been instrumental in promoting transparent evaluation of new interventions. More recently, there has been a growing recognition that interventions involving artificial intelligence (AI) need to undergo rigorous, prospective evaluation to demonstrate their impact on health outcomes. The SPIRIT-AI (Standard Protocol Items: Recommendations for Interventional Trials-Artificial Intelligence) extension is a new reporting guideline for clinical trial protocols evaluating interventions with an AI component. It was developed in parallel with its companion statement for trial reports: CONSORT-AI (Consolidated Standards of Reporting Trials-Artificial Intelligence). Both guidelines were developed through a staged consensus process involving literature review and expert consultation to generate 26 candidate items, which were consulted upon by an international multi-stakeholder group in a two-stage Delphi survey (103 stakeholders), agreed upon in a consensus meeting (31 stakeholders) and refined through a checklist pilot (34 participants). The SPIRIT-AI extension includes 15 new items that were considered sufficiently important for clinical trial protocols of AI interventions. These new items should be routinely reported in addition to the core SPIRIT 2013 items. SPIRIT-AI recommends that investigators provide clear descriptions of the AI intervention, including instructions and skills required for use, the setting in which the AI intervention will be integrated, considerations for the handling of input and output data, the human-AI interaction and analysis of error cases. SPIRIT-AI will help promote transparency and completeness for clinical trial protocols for AI interventions. 
Its use will assist editors and peer reviewers, as well as the general readership, to understand, interpret and critically appraise the design and risk of bias for a planned clinical trial.



4.
J Clin Neurosci ; 96: 80-84, 2022 Feb.
Article in English | MEDLINE | ID: mdl-34999495

ABSTRACT

Machine learning may be able to help with predicting factors that aid in discharge planning for stroke patients. This study aims to validate previously derived models, on external and prospective datasets, for the prediction of discharge modified Rankin scale (mRS), discharge destination, survival to discharge and length of stay. Data were collected from consecutive patients admitted with ischaemic or haemorrhagic stroke at the Royal Adelaide Hospital from September 2019 to January 2020, and at the Lyell McEwin Hospital from January 2017 to January 2020. The previously derived models were then applied to these datasets with three pre-defined cut-off scores (high-sensitivity, Youden's index, and high-specificity) to return indicators of performance including area under the receiver operator curve (AUC), sensitivity and specificity. The number of individuals included in the prospective and external datasets were 334 and 824 respectively. The models performed well on both the prospective and external datasets in the prediction of discharge mRS ≤ 2 (AUC 0.85 and 0.87), discharge destination to home (AUC 0.76 and 0.78) and survival to discharge (AUC 0.91 and 0.92). Accurate prediction of length of stay with only admission data remains difficult (AUC 0.62 and 0.66). This study demonstrates successful prospective and external validation of machine learning models using six variables to predict information relevant to discharge planning for stroke patients. Further research is required to demonstrate patient or system benefits following implementation of these models.
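The validation workflow above reduces to computing the AUC and then applying pre-defined score cut-offs that trade sensitivity against specificity. A minimal, dependency-free sketch of those two metrics; the function names and toy data are illustrative assumptions, not the study's code:

```python
def auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney statistic:
    the probability that a random positive case outranks a random negative one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when scores >= cutoff are called positive.
    Raising the cutoff gives a high-specificity regime; lowering it, high sensitivity."""
    tp = sum(y == 1 and s >= cutoff for s, y in zip(scores, labels))
    fn = sum(y == 1 and s < cutoff for s, y in zip(scores, labels))
    tn = sum(y == 0 and s < cutoff for s, y in zip(scores, labels))
    fp = sum(y == 0 and s >= cutoff for s, y in zip(scores, labels))
    return tp / (tp + fn), tn / (tn + fp)
```

A Youden's-index cut-off, one of the three pre-defined cut-offs mentioned, corresponds to the threshold maximising sensitivity + specificity - 1.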


Subject(s)
Patient Discharge , Stroke , Hospitalization , Humans , Machine Learning , Prospective Studies , Stroke/diagnosis , Stroke/therapy
5.
Intern Med J ; 52(2): 176-185, 2022 Feb.
Article in English | MEDLINE | ID: mdl-33094899

ABSTRACT

Length of stay (LOS) estimates are important for patients, doctors and hospital administrators. However, making accurate estimates of LOS can be difficult for medical patients. This review was conducted with the aim of identifying and assessing previous studies on the application of machine learning to the prediction of total hospital inpatient LOS for medical patients. A review of machine learning in the prediction of total hospital LOS for medical inpatients was conducted using the databases PubMed, EMBASE and Web of Science. Of the 673 publications returned by the initial search, 21 articles met inclusion criteria. Of these articles the most commonly represented medical specialty was cardiology. Studies were also identified that had specifically evaluated machine learning LOS prediction in patients with diabetes and tuberculosis. The performance of the machine learning models in the identified studies varied significantly depending on factors including differing input datasets and different LOS thresholds and outcome metrics. Common methodological shortcomings included a lack of reporting of patient demographics and lack of reporting of clinical details of included patients. The variable performance reported by the studies identified in this review supports the need for further research of the utility of machine learning in the prediction of total inpatient LOS in medical patients. Future studies should follow and report a more standardised methodology to better assess performance and to allow replication and validation. In particular, prospective validation studies and studies assessing the clinical impact of such machine learning models would be beneficial.


Subject(s)
Inpatients , Machine Learning , Databases, Factual , Forecasting , Humans , Length of Stay
6.
J Med Imaging Radiat Oncol ; 66(3): 319-323, 2022 Apr.
Article in English | MEDLINE | ID: mdl-34250746

ABSTRACT

INTRODUCTION: Prostate cancer diagnosis is shifting towards a minimally invasive approach, maintaining accuracy and efficacy while reducing morbidity. We aimed to assess if PSMA-Ga68 PET/CT can accurately grade and localise prostatic malignancy using objective methods, compared with pathology and MRI. METHODS: Retrospective analysis on 114 consecutive patients undergoing staging PSMA PET/CT scans over 12 months was carried out. The SUVmax and site of highest PSMA activity within the prostate gland were recorded. Pathology/biopsy review assessed maximum Gleason score (and location). MRI analysis assessed the highest PIRADS score and location. The grade, location and size of malignant tissue on biopsy, and PSA, were correlated with the SUVmax and the PIRADS score. RESULTS: SUVmax was significantly elevated in cases with PSA ≥10 (P = 0.003) and Gleason score ≥8 (P = 0.0002). SUVmax demonstrated equivalent sensitivity to MRI-PIRADS in predicting Gleason ≥8 disease, with higher specificity when tested under a high-specificity regime (SUVmax ≥10, PIRADS = 5, P = 0.002). Furthermore, the region of highest SUVmax was superior to MRI-PIRADS for localising the highest grade tumour region, correctly identifying 71% of highest grade regions compared to 54% with MRI (P = 0.015). CONCLUSION: PSMA PET/CT is as effective as MRI in identifying high-grade prostate malignancy. Our findings also support previous studies in showing a significant relationship between SUVmax and Gleason grade. These benefits, along with the known advantage in identifying distant metastases and the reduced cost, further support the argument that PSMA PET/CT should be offered as an initial investigation in the workup of prostate cancer.


Subject(s)
Positron Emission Tomography Computed Tomography , Prostatic Neoplasms , Humans , Magnetic Resonance Imaging , Male , Positron Emission Tomography Computed Tomography/methods , Prostate-Specific Antigen , Prostatic Neoplasms/diagnostic imaging , Prostatic Neoplasms/pathology , Retrospective Studies
7.
BMJ Open ; 11(12): e052902, 2021 12 20.
Article in English | MEDLINE | ID: mdl-34930738

ABSTRACT

OBJECTIVES: Artificial intelligence (AI) algorithms have been developed to detect imaging features on chest X-ray (CXR) with a comprehensive AI model capable of detecting 124 CXR findings being recently developed. The aim of this study was to evaluate the real-world usefulness of the model as a diagnostic assistance device for radiologists. DESIGN: This prospective real-world multicentre study involved a group of radiologists using the model in their daily reporting workflow to report consecutive CXRs and recording their feedback on level of agreement with the model findings and whether this significantly affected their reporting. SETTING: The study took place at radiology clinics and hospitals within a large radiology network in Australia between November and December 2020. PARTICIPANTS: Eleven consultant diagnostic radiologists of varying levels of experience participated in this study. PRIMARY AND SECONDARY OUTCOME MEASURES: Proportion of CXR cases where use of the AI model led to significant material changes to the radiologist report, to patient management, or to imaging recommendations. Additionally, level of agreement between radiologists and the model findings, and radiologist attitudes towards the model were assessed. RESULTS: Of 2972 cases reviewed with the model, 92 cases (3.1%) had significant report changes, 43 cases (1.4%) had changed patient management and 29 cases (1.0%) had further imaging recommendations. In terms of agreement with the model, 2569 cases showed complete agreement (86.5%). 390 (13%) cases had one or more findings rejected by the radiologist. There were 16 findings across 13 cases (0.5%) deemed to be missed by the model. Nine out of 10 radiologists felt their accuracy was improved with the model and were more positive towards AI poststudy. 
CONCLUSIONS: Use of an AI model in a real-world reporting environment significantly improved radiologist reporting and showed good agreement with radiologists, highlighting the potential for AI diagnostic support to improve clinical practice.


Subject(s)
Artificial Intelligence , Deep Learning , Algorithms , Humans , Prospective Studies , Radiologists
8.
BMJ Open ; 11(12): e053024, 2021 12 07.
Article in English | MEDLINE | ID: mdl-34876430

ABSTRACT

OBJECTIVES: To evaluate the ability of a commercially available comprehensive chest radiography deep convolutional neural network (DCNN) to detect simple and tension pneumothorax, as stratified by the following subgroups: the presence of an intercostal drain; rib, clavicular, scapular or humeral fractures or rib resections; subcutaneous emphysema and erect versus non-erect positioning. The hypothesis was that performance would not differ significantly in each of these subgroups when compared with the overall test dataset. DESIGN: A retrospective case-control study was undertaken. SETTING: Community radiology clinics and hospitals in Australia and the USA. PARTICIPANTS: A test dataset of 2557 chest radiography studies was ground-truthed by three subspecialty thoracic radiologists for the presence of simple or tension pneumothorax as well as each subgroup other than positioning. Radiograph positioning was derived from radiographer annotations on the images. OUTCOME MEASURES: DCNN performance for detecting simple and tension pneumothorax was evaluated over the entire test set, as well as within each subgroup, using the area under the receiver operating characteristic curve (AUC). A difference in AUC of more than 0.05 was considered clinically significant. RESULTS: When compared with the overall test set, performance of the DCNN for detecting simple and tension pneumothorax was statistically non-inferior in all subgroups. The DCNN had an AUC of 0.981 (0.976-0.986) for detecting simple pneumothorax and 0.997 (0.995-0.999) for detecting tension pneumothorax. CONCLUSIONS: Hidden stratification has significant implications for potential failures of deep learning when applied in clinical practice. This study demonstrated that a comprehensively trained DCNN can be resilient to hidden stratification in several clinically meaningful subgroups in detecting pneumothorax.


Subject(s)
Deep Learning , Pneumothorax , Algorithms , Case-Control Studies , Humans , Pneumothorax/diagnostic imaging , Radiography , Radiography, Thoracic/methods , Retrospective Studies
9.
Lancet Digit Health ; 3(11): e745-e750, 2021 11.
Article in English | MEDLINE | ID: mdl-34711379

ABSTRACT

The black-box nature of current artificial intelligence (AI) has caused some to question whether AI must be explainable to be used in high-stakes scenarios such as medicine. It has been argued that explainable AI will engender trust with the health-care workforce, provide transparency into the AI decision making process, and potentially mitigate various kinds of bias. In this Viewpoint, we argue that this argument represents a false hope for explainable AI and that current explainability methods are unlikely to achieve these goals for patient-level decision support. We provide an overview of current explainability techniques and highlight how various failure cases can cause problems for decision making for individual patients. In the absence of suitable explainability methods, we advocate for rigorous internal and external validation of AI models as a more direct means of achieving the goals often associated with explainability, and we caution against having explainability be a requirement for clinically deployed models.


Subject(s)
Artificial Intelligence , Communication , Comprehension , Delivery of Health Care/methods , Dissent and Disputes , Trust , Bias , Decision Making , Diagnostic Imaging , Health Personnel , Humans , Models, Biological
10.
Intern Med J ; 51(9): 1539-1542, 2021 Sep.
Article in English | MEDLINE | ID: mdl-34541769

ABSTRACT

To use tools that employ machine learning (ML) effectively in clinical practice, medical students and doctors will require a degree of understanding of ML models. To evaluate current levels of understanding, a formative examination and survey were conducted across three centres in Australia, New Zealand and the United States. Of the 245 individuals who participated in the study (response rate = 45.4%), the majority had difficulty with identifying weaknesses in model performance analysis. Further studies examining educational interventions addressing such ML topics are warranted.


Subject(s)
Education, Medical, Undergraduate , Students, Medical , Australia/epidemiology , Cross-Sectional Studies , Curriculum , Humans , Machine Learning , United States
11.
Lancet Digit Health ; 3(8): e496-e506, 2021 08.
Article in English | MEDLINE | ID: mdl-34219054

ABSTRACT

BACKGROUND: Chest x-rays are widely used in clinical practice; however, interpretation can be hindered by human error and a lack of experienced thoracic radiologists. Deep learning has the potential to improve the accuracy of chest x-ray interpretation. We therefore aimed to assess the accuracy of radiologists with and without the assistance of a deep-learning model. METHODS: In this retrospective study, a deep-learning model was trained on 821 681 images (284 649 patients) from five data sets from Australia, Europe, and the USA. 2568 enriched chest x-ray cases from adult patients (≥16 years) who had at least one frontal chest x-ray were included in the test dataset; cases were representative of inpatient, outpatient, and emergency settings. 20 radiologists reviewed cases with and without the assistance of the deep-learning model with a 3-month washout period. We assessed the change in accuracy of chest x-ray interpretation across 127 clinical findings when the deep-learning model was used as a decision support by calculating area under the receiver operating characteristic curve (AUC) for each radiologist with and without the deep-learning model. We also compared AUCs for the model alone with those of unassisted radiologists. If the lower bound of the adjusted 95% CI of the difference in AUC between the model and the unassisted radiologists was more than -0·05, the model was considered to be non-inferior for that finding. If the lower bound exceeded 0, the model was considered to be superior. FINDINGS: Unassisted radiologists had a macroaveraged AUC of 0·713 (95% CI 0·645-0·785) across the 127 clinical findings, compared with 0·808 (0·763-0·839) when assisted by the model. The deep-learning model statistically significantly improved the classification accuracy of radiologists for 102 (80%) of 127 clinical findings, was statistically non-inferior for 19 (15%) findings, and no findings showed a decrease in accuracy when radiologists used the deep-learning model. 
Unassisted radiologists had a macroaveraged mean AUC of 0·713 (0·645-0·785) across all findings, compared with 0·957 (0·954-0·959) for the model alone. Model classification alone was significantly more accurate than unassisted radiologists for 117 (94%) of 124 clinical findings predicted by the model and was non-inferior to unassisted radiologists for all other clinical findings. INTERPRETATION: This study shows the potential of a comprehensive deep-learning model to improve chest x-ray interpretation across a large breadth of clinical practice. FUNDING: Annalise.ai.
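The non-inferiority criterion used in the findings above is a direct comparison of the lower bound of the adjusted 95% CI of the AUC difference against a pre-specified -0.05 margin. A sketch of that decision rule; the function name and labels are illustrative assumptions:

```python
def compare_auc(ci_lower, margin=-0.05):
    """Classify model vs unassisted radiologists from the lower bound of the
    adjusted 95% CI of (model AUC - radiologist AUC), per the stated criteria:
    superior if the bound exceeds 0, non-inferior if it exceeds the margin."""
    if ci_lower > 0:
        return "superior"
    if ci_lower > margin:
        return "non-inferior"
    return "not demonstrated"
```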


Subject(s)
Deep Learning , Mass Screening/methods , Models, Biological , Radiographic Image Interpretation, Computer-Assisted , Radiography, Thoracic , X-Rays , Adolescent , Adult , Aged , Aged, 80 and over , Area Under Curve , Artificial Intelligence , Female , Humans , Infections/diagnosis , Infections/diagnostic imaging , Male , Middle Aged , ROC Curve , Radiologists , Retrospective Studies , Thoracic Injuries/diagnosis , Thoracic Injuries/diagnostic imaging , Thoracic Neoplasms/diagnosis , Thoracic Neoplasms/diagnostic imaging , Young Adult
12.
J Med Imaging Radiat Oncol ; 65(5): 538-544, 2021 Aug.
Article in English | MEDLINE | ID: mdl-34169648

ABSTRACT

Despite its simple acquisition technique, the chest X-ray remains the most common first-line imaging tool for chest assessment globally. Recent evidence for image analysis using modern machine learning points to possible improvements in both the efficiency and the accuracy of chest X-ray interpretation. While promising, these machine learning algorithms have not provided comprehensive assessment of findings in an image and do not account for clinical history or other relevant clinical information. However, the rapid evolution in technology and evidence base for its use suggests that the next generation of comprehensive, well-tested machine learning algorithms will be a revolution akin to early advances in X-ray technology. Current use cases, strengths, limitations and applications of chest X-ray machine learning systems are discussed.


Subject(s)
Machine Learning , Humans , Image Processing, Computer-Assisted , Radiography , Thorax
13.
Sci Rep ; 11(1): 5193, 2021 03 04.
Article in English | MEDLINE | ID: mdl-33664367

ABSTRACT

Artificial intelligence technology has advanced rapidly in recent years and has the potential to improve healthcare outcomes. However, technology uptake will be largely driven by clinicians, and there is a paucity of data regarding the attitude that clinicians have to this new technology. In June-August 2019 we conducted an online survey of fellows and trainees of three specialty colleges (ophthalmology, radiology/radiation oncology, dermatology) in Australia and New Zealand on artificial intelligence. There were 632 complete responses (n = 305, 230, and 97, respectively), equating to a response rate of 20.4%, 5.1%, and 13.2% for the above colleges, respectively. The majority (n = 449, 71.0%) believed artificial intelligence would improve their field of medicine, and that medical workforce needs would be impacted by the technology within the next decade (n = 542, 85.8%). Improved disease screening and streamlining of monotonous tasks were identified as key benefits of artificial intelligence. The divestment of healthcare to technology companies and medical liability implications were the greatest concerns. Education was identified as a priority to prepare clinicians for the implementation of artificial intelligence in healthcare. This survey highlights parallels between the perceptions of different clinician groups in Australia and New Zealand about artificial intelligence in medicine. Artificial intelligence was recognized as valuable technology that will have wide-ranging impacts on healthcare.

14.
Article in English | MEDLINE | ID: mdl-33196064

ABSTRACT

Machine learning models for medical image analysis often suffer from poor performance on important subsets of a population that are not identified during training or testing. For example, overall performance of a cancer detection model may be high, but the model may still consistently miss a rare but aggressive cancer subtype. We refer to this problem as hidden stratification, and observe that it results from incompletely describing the meaningful variation in a dataset. While hidden stratification can substantially reduce the clinical efficacy of machine learning models, its effects remain difficult to measure. In this work, we assess the utility of several possible techniques for measuring hidden stratification effects, and characterize these effects both via synthetic experiments on the CIFAR-100 benchmark dataset and on multiple real-world medical imaging datasets. Using these measurement techniques, we find evidence that hidden stratification can occur in unidentified imaging subsets with low prevalence, low label quality, subtle distinguishing features, or spurious correlates, and that it can result in relative performance differences of over 20% on clinically important subsets. Finally, we discuss the clinical implications of our findings, and suggest that evaluation of hidden stratification should be a critical component of any machine learning deployment in medical imaging.
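The effect described above can be illustrated numerically. The sketch below uses invented labels, predictions, and subgroup counts (none of them from the study) to show how strong overall accuracy can hide near-total failure on a rare subset:

```python
# Illustration of hidden stratification with invented data: overall
# accuracy looks strong while a low-prevalence "rare" subgroup
# (e.g. an aggressive cancer subtype) is mostly missed.

def accuracy(pairs):
    """Fraction of (true, predicted) pairs that agree."""
    return sum(y == p for y, p in pairs) / len(pairs)

# (true label, predicted label, subgroup); the rare subgroup is 10% of cases
cases = (
    [(1, 1, "common")] * 88 + [(1, 0, "common")] * 2  # common: 88/90 correct
    + [(1, 1, "rare")] * 2 + [(1, 0, "rare")] * 8     # rare: only 2/10 correct
)

overall = accuracy([(y, p) for y, p, _ in cases])              # 0.90
rare = accuracy([(y, p) for y, p, g in cases if g == "rare"])  # 0.20
print(f"overall={overall:.2f} rare={rare:.2f} gap={overall - rare:.2f}")
```

A subgroup-stratified evaluation like this only works when the subgroup labels exist; the paper's point is that the harmful strata are often unlabeled, which is why dedicated measurement techniques are needed.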

15.
J Clin Neurosci ; 79: 100-103, 2020 Sep.
Article in English | MEDLINE | ID: mdl-33070874

ABSTRACT

Post-stroke discharge planning may be aided by accurate early prognostication, and machine learning may be able to assist with such prognostication. The study's primary aim was to evaluate the performance of machine learning models using admission data to predict the likely length of stay (LOS) for patients admitted with stroke. Secondary aims included the prediction of discharge modified Rankin Scale (mRS), in-hospital mortality, and discharge destination. In this study, a retrospective dataset was used to develop and test a variety of machine learning models. The patients included in the study were all stroke admissions (both ischaemic stroke and intracerebral haemorrhage) at a single tertiary hospital between December 2016 and September 2019. The machine learning models developed and tested (75%/25% train/test split) included logistic regression, random forests, decision trees and artificial neural networks. The study included 2840 patients. In LOS prediction the highest area under the receiver operating characteristic curve (AUC) was achieved on the unseen test dataset by an artificial neural network, at 0.67. Higher AUCs were achieved using logistic regression models in the prediction of discharge functional independence (mRS ≤2) (AUC 0.90) and in the prediction of in-hospital mortality (AUC 0.90). Logistic regression was also the best-performing model for predicting home vs non-home discharge destination (AUC 0.81). This study indicates that machine learning may aid in the prognostication of factors relevant to post-stroke discharge planning. Further prospective and external validation is required, as well as assessment of the impact of subsequent implementation.
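The AUC figures reported above measure ranking quality: AUC equals the probability that a randomly chosen positive case receives a higher model score than a randomly chosen negative one. A minimal sketch of that computation, with invented scores (not data from the study):

```python
def auc(pos_scores, neg_scores):
    """AUC via the Mann-Whitney U statistic: the probability that a random
    positive outscores a random negative, counting ties as half a win."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Invented model scores for patients discharged functionally independent
# (mRS <= 2, "positive") vs not ("negative")
pos = [0.9, 0.8, 0.7, 0.6]
neg = [0.65, 0.5, 0.4, 0.2]
print(auc(pos, neg))  # 15 of 16 pairs ranked correctly -> 0.9375
```

This pairwise form is quadratic in the number of cases; production code would use a sorted-rank implementation, but the probability interpretation is identical.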


Subject(s)
Length of Stay; Machine Learning; Patient Discharge; Prognosis; Stroke; Female; Humans; Male; Middle Aged; Neural Networks, Computer; Retrospective Studies
17.
Acad Radiol ; 27(2): e19-e23, 2020 02.
Article in English | MEDLINE | ID: mdl-31053480

ABSTRACT

RATIONALE AND OBJECTIVES: Decision-making and consent for intravenous thrombolysis would be assisted by an individualized risk-benefit ratio. Deep learning (DL) models may be able to assist with this patient selection. MATERIALS AND METHODS: Clinical data regarding consecutive patients who received intravenous thrombolysis across two tertiary hospitals over a 7-year period were extracted from existing databases. The noncontrast computed tomography brain scans for these patients were then retrieved from hospital picture archiving and communication systems. Using a combination of convolutional neural networks (CNN) and artificial neural networks (ANN), several models were developed to predict either improvement in the National Institutes of Health Stroke Scale of ≥4 points at 24 hours ("NIHSS24") or modified Rankin Scale 0-1 at 90 days ("mRS90"). The developed CNN and ANN were then applied to a test set. The THRIVE, HIAT, and SPAN-100 scores were also calculated for the patients in the test set and used to predict NIHSS24 and mRS90. RESULTS: Data from 204 individuals were included in the project. The best-performing DL model for prediction of mRS90 was a combined CNN + ANN based on clinical data and computed tomography brain imaging (accuracy = 0.74, F1 score = 0.69). The best-performing model for NIHSS24 prediction was also the combined CNN + ANN (accuracy = 0.71, F1 score = 0.74). CONCLUSION: DL models may aid in the prediction of functional thrombolysis outcomes. Further investigation with larger datasets and additional imaging sequences is indicated.
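The best model above fuses CNN-derived image features with clinical data. A minimal late-fusion sketch of that idea: all feature values, weights, and the single logistic output unit below are invented for illustration and do not reflect the study's actual architecture.

```python
import math

def fused_score(image_feats, clinical_feats, weights, bias):
    """Late fusion: concatenate CNN image features with clinical features,
    then apply one logistic unit to score an outcome (e.g. mRS 0-1 at 90 days)."""
    x = list(image_feats) + list(clinical_feats)
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Invented example: two image-derived features, two clinical features
p = fused_score([0.4, 0.1], [1.0, 0.0],
                weights=[1.2, -0.5, 0.8, 0.3], bias=-0.9)
print(round(p, 3))
```

In the full model the "image features" would come from the penultimate layer of a trained CNN and the fused head would be trained end to end; this sketch only shows the shape of the combination.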


Subject(s)
Brain Ischemia; Deep Learning; Stroke; Brain Ischemia/diagnostic imaging; Brain Ischemia/drug therapy; Humans; Pilot Projects; Stroke/diagnostic imaging; Stroke/drug therapy; Thrombolytic Therapy
18.
Acad Radiol ; 27(1): 106-112, 2020 01.
Article in English | MEDLINE | ID: mdl-31706792

ABSTRACT

RATIONALE AND OBJECTIVES: Medical artificial intelligence systems are dependent on well characterized large-scale datasets. Recently released public datasets have been of great interest to the field, but pose specific challenges due to the disconnect they cause between data generation and data usage, potentially limiting the utility of these datasets. MATERIALS AND METHODS: We visually explore two large public datasets, to determine how accurate the provided labels are and whether other subtle problems exist. The ChestXray14 dataset contains 112,120 frontal chest films, and the Musculoskeletal Radiology (MURA) dataset contains 40,561 upper limb radiographs. A subset of around 700 images from both datasets was reviewed by a board-certified radiologist, and the quality of the original labels was determined. RESULTS: The ChestXray14 labels did not accurately reflect the visual content of the images, with positive predictive values mostly between 10% and 30% lower than the values presented in the original documentation. There were other significant problems, with examples of hidden stratification and label disambiguation failure. The MURA labels were more accurate, but the original normal/abnormal labels were inaccurate for the subset of cases with degenerative joint disease, with a sensitivity of 60% and a specificity of 82%. CONCLUSION: Visual inspection of images is a necessary component of understanding large image datasets. We recommend that teams producing public datasets should perform this important quality control procedure and include a thorough description of their findings, along with an explanation of the data generating procedures and labeling rules, in the documentation for their datasets.
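The label-quality figures above (positive predictive value, sensitivity, specificity) follow directly from a confusion matrix comparing dataset labels against the radiologist's visual review. A sketch with invented counts (not the study's data):

```python
def label_quality(tp, fp, fn, tn):
    """PPV, sensitivity, and specificity of dataset labels measured against
    a radiologist's visual review as the reference standard."""
    ppv = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return ppv, sensitivity, specificity

# Invented counts for one finding in a ~700-image review subset
ppv, sens, spec = label_quality(tp=30, fp=70, fn=20, tn=580)
print(f"PPV={ppv:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Note that PPV depends on prevalence in the reviewed subset, which is one reason the paper recommends documenting the labeling rules and data-generating procedure alongside the counts.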


Subject(s)
Artificial Intelligence; Radiology; Documentation; Humans; Radiologists; Sensitivity and Specificity
19.
J Clin Neurosci ; 70: 11-13, 2019 Dec.
Article in English | MEDLINE | ID: mdl-31648967

ABSTRACT

The identification of high-grade glioma (HGG) progression may pose a diagnostic dilemma due to similar appearances of treatment-related changes (TRC) (e.g. pseudoprogression or radionecrosis). Deep learning (DL) may be able to assist with this task. MRI scans from consecutive patients with histologically confirmed HGG (grade 3 or 4) were reviewed. Scans for which recurrence or TRC was queried were followed up to determine whether the cases indicated recurrence/progression or TRC. Identified cases were randomly split into training and testing sets (80%/20%). Following development on the training set, classification experiments using convolutional neural networks (CNN) were then conducted using models based on each of diffusion weighted imaging (DWI - isotropic diffusion map), apparent diffusion coefficient (ADC), FLAIR and post-contrast T1 sequences. The sequence that achieved the highest accuracy on the test set was then used to develop DL models in which multiple sequences were combined. MRI scans from 55 patients were included in the study (70.1% progression/recurrence). 54.5% of the randomly allocated test set had progression/recurrence. Based upon DWI sequences the CNN achieved an accuracy of 0.73 (F1 score = 0.67). The model based on the DWI+FLAIR sequences in combination achieved an accuracy of 0.82 (F1 score = 0.86). The results of this study support similar studies that have shown that machine learning, in particular DL, may be useful in distinguishing progression/recurrence from TRC. Further studies examining the accuracy of DL models, including magnetic resonance perfusion (MRP) and magnetic resonance spectroscopy (MRS), with larger sample sizes may be beneficial.
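The accuracy and F1 figures above summarize binary classification on the held-out scans; F1 is the harmonic mean of precision and recall, with progression/recurrence as the positive class. A sketch with invented confusion-matrix counts (not the study's data):

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1 from binary confusion-matrix counts, treating
    progression/recurrence as the positive class."""
    acc = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return acc, f1

# Invented 11-scan test set, roughly matching the reported class balance
acc, f1 = accuracy_and_f1(tp=6, fp=1, fn=1, tn=3)
print(f"accuracy={acc:.2f} F1={f1:.2f}")
```

With a test set this small (the study's held-out set was 20% of 55 patients), a single reclassified scan moves both metrics substantially, which is why the abstract calls for larger samples.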


Subject(s)
Brain Neoplasms/diagnostic imaging; Deep Learning; Glioma/diagnostic imaging; Image Interpretation, Computer-Assisted/methods; Magnetic Resonance Imaging/methods; Neoplasm Recurrence, Local/diagnostic imaging; Adult; Aged; Brain Neoplasms/pathology; Female; Glioma/pathology; Humans; Male; Middle Aged; Pilot Projects
20.
NPJ Digit Med ; 2: 31, 2019.
Article in English | MEDLINE | ID: mdl-31304378

ABSTRACT

Hip fractures are a leading cause of death and disability among older adults. Hip fractures are also the most commonly missed diagnosis on pelvic radiographs, and delayed diagnosis leads to higher cost and worse outcomes. Computer-aided diagnosis (CAD) algorithms have shown promise for helping radiologists detect fractures, but the image features underpinning their predictions are notoriously difficult to understand. In this study, we trained deep-learning models on 17,587 radiographs to classify fracture, 5 patient traits, and 14 hospital process variables. All 20 variables could be individually predicted from a radiograph, with the best performances on scanner model (AUC = 1.00), scanner brand (AUC = 0.98), and whether the order was marked "priority" (AUC = 0.79). Fracture was predicted moderately well from the image (AUC = 0.78) and better when combining image features with patient data (AUC = 0.86, DeLong paired AUC comparison, p = 2e-9) or patient data plus hospital process features (AUC = 0.91, p = 1e-21). Fracture prediction on a test set that balanced fracture risk across patient variables was significantly lower than a random test set (AUC = 0.67, DeLong unpaired AUC comparison, p = 0.003); and on a test set with fracture risk balanced across patient and hospital process variables, the model performed randomly (AUC = 0.52, 95% CI 0.46-0.58), indicating that these variables were the main source of the model's fracture predictions. A single model that directly combines image features, patient, and hospital process data outperforms a Naive Bayes ensemble of an image-only model prediction, patient, and hospital process data. If CAD algorithms are inexplicably leveraging patient and process variables in their predictions, it is unclear how radiologists should interpret their predictions in the context of other known patient data. Further research is needed to illuminate deep-learning decision processes so that computers and clinicians can effectively cooperate.
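The balanced-test-set result above can be reproduced in miniature: if a "model" keys on a hospital process variable (here a hypothetical scanner identity) rather than the fracture itself, its apparent skill vanishes once the test set is balanced so that the variable no longer correlates with the outcome. All counts below are invented:

```python
def scanner_only_predict(scanner):
    """A degenerate 'fracture model' that looks only at which scanner
    produced the film -- a stand-in for shortcut learning."""
    return 1 if scanner == "A" else 0

def acc(cases):
    """Accuracy of the shortcut model over (scanner, fracture) cases."""
    return sum(scanner_only_predict(s) == y for s, y in cases) / len(cases)

# Natural test set: fractures concentrate on scanner A, so the shortcut works
natural = [("A", 1)] * 40 + [("A", 0)] * 10 + [("B", 1)] * 10 + [("B", 0)] * 40
# Balanced test set: fracture prevalence equalized across scanners
balanced = [("A", 1)] * 25 + [("A", 0)] * 25 + [("B", 1)] * 25 + [("B", 0)] * 25

print(acc(natural), acc(balanced))  # 0.8 on natural data, 0.5 (chance) balanced
```

This is the logic behind the study's stratified test sets: performance that survives balancing reflects genuine image evidence, while performance that collapses to chance was carried by the confounders.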
